Abstract

The increasing computational demands of finite-difference time-domain simulations for
studying optical phenomena require codes with very high performance. This paper introduces
a new hybrid parallel code using standard MPI and OpenMP technologies. The code
is portable to a variety of high performance systems. The transmission of light through an
isolated sub-wavelength aperture in an optically thick silver film is outlined as an example
application of the code.


1  Introduction 

The report of enhanced optical transmission through an array of nano-scale holes by Ebbesen
et al. [1] in 1998 has led to considerable interest in the phenomenon and its applications. Enhanced
transmission is observed when the surface of a thin film of metal is patterned with some arrangement
of periodic features, such as an array of perforations, where each hole is much smaller than
the  wavelength  of  the  incident  light.  Surface  plasmons  have  been  generally  accepted  as  the  waves 
determining  the  behavior  of  the  periodic  structures  at  optical  frequencies  and  much  has  been 
learned  about  their  propagation  and  possible  applications  in  photonic  technologies.  Experiments 
and  theoretical  analyses  have  helped  to  understand  the  nature  of  the  enhanced  light  transmission, 
and  to  provide  some  quantitative  approaches  to  predicting  resonant  wavelengths  for  some,  but 
not  all,  geometries.  It  has  been  pointed  out  by  Garcia-Vidal,  et  al.  [2],  and  is  apparent  from 
the  analyses  published  so  far,  that  “calculations  of  transmittance  spectra  for  a  finite  structure..., 
using  a  realistic  dielectric  function,  are  a  computational  tour  de  force  that  are  beyond  reach  at 
the  moment...”.  Design  and  optimization  of  devices  that  are  relevant  in  practical  applications  of 
nano-structures  cannot  be  achieved  by  approximate  analyses.  It  is  particularly  difficult  to  include 
thin layers (on the order of λ/4) of dielectrics on one or both faces of the film, rectangular or
elliptical  holes,  holes  in  combination  with  circular  corrugations,  asymmetric  arrays,  and  similar 
structures  that  are  of  practical  importance. 

The finite-difference time-domain (FDTD) method is the best suited to solving these
problems, and its use has been previously reported [3]. However, for many structures of practical
interest, single platform codes do not provide sufficient computational resources. In this
contribution, we describe a new hybrid parallel code using standard MPI and OpenMP technologies.
An example of the application is also given.
2  Parallel  FDTD  Algorithm

2.1  Previous  Implementations 

Since  its  introduction  to  electromagnetics  by  Yee,  FDTD  has  become  very  popular  and  highly 
developed  [4].  Numerous  parallel  codes  have  implemented  FDTD  [5].  Most  older  parallel  codes 
have  been  targeted  at  specialized  vector  supercomputers,  which  are  very  expensive  compared  to 
the  clustering  approach  [6]  that  has  now  become  common.  Clusters  of  commodity  computers,  or 
nodes,  rely  on  a  high  speed  network  to  connect  them  together.  A  message  passing  system  allows 
the  nodes  in  the  cluster  to  exchange  information  as  needed  during  the  execution  of  an  algorithm. 
The most common message passing system is the Message Passing Interface (MPI). A less common,
older system is the Parallel Virtual Machine (PVM).

The FDTD algorithm is well suited to implementation on a cluster. Updating a component
of the electric or magnetic field depends only on the previous value of the field component in a
given cell and on the field components in adjacent cells. The FDTD domain can be decomposed
into a number of sub-domains, each of which can be assigned to a single node. Only the field
components at the boundary between two sub-domains need to be passed between nodes, once
per time step. This allows an FDTD code using MPI to increase its performance by processing
multiple parts of the domain in parallel. It also allows much larger problems to be simulated,
since the domain is divided into smaller parts which can individually fit into available memory,
where the domain as a whole may be too large for a single machine.
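
The boundary exchange described above can be illustrated with a minimal sketch. The Python example below uses a one-dimensional grid in normalized units and keeps two sub-domains in separate lists inside a single process. The function names, the coefficient C, the grid size, and the plain assignments standing in for an MPI send/receive pair are all assumptions made for illustration; they are not taken from the code described in this paper.

```python
import math

C = 0.5  # normalized update coefficient (assumed value for this sketch)


def step_single(ez, hy):
    """Advance an undivided 1D grid by one time step (reference result)."""
    n = len(ez)
    for i in range(n - 1):
        hy[i] += C * (ez[i + 1] - ez[i])
    for i in range(1, n - 1):  # ez[0] and ez[n-1] stay zero (conducting walls)
        ez[i] += C * (hy[i] - hy[i - 1])


def step_split(ez_l, hy_l, ez_r, hy_r):
    """Advance two sub-domains, passing one boundary value each way per step."""
    s = len(hy_l)              # number of cells owned by the left sub-domain
    ez_l[s] = ez_r[0]          # stand-in for receiving E at the interface
    for i in range(s):
        hy_l[i] += C * (ez_l[i + 1] - ez_l[i])
    for j in range(1, len(hy_r)):
        hy_r[j] += C * (ez_r[j] - ez_r[j - 1])
    hy_r[0] = hy_l[s - 1]      # stand-in for receiving H at the interface
    for i in range(1, s):
        ez_l[i] += C * (hy_l[i] - hy_l[i - 1])
    for j in range(len(ez_r) - 1):
        ez_r[j] += C * (hy_r[j + 1] - hy_r[j])


def run_demo(n=40, s=17, steps=60):
    """Return the largest difference between the split and undivided results."""
    pulse = [math.exp(-((i - 12) / 3.0) ** 2) for i in range(n)]
    pulse[0] = pulse[-1] = 0.0
    ez, hy = pulse[:], [0.0] * (n - 1)
    ez_l, hy_l = pulse[:s] + [0.0], [0.0] * s      # last ez_l entry is a ghost cell
    ez_r, hy_r = pulse[s:], [0.0] * (n - s)        # first hy_r entry is a ghost cell
    for _ in range(steps):
        step_single(ez, hy)
        step_split(ez_l, hy_l, ez_r, hy_r)
    joined = ez_l[:s] + ez_r
    return max(abs(a - b) for a, b in zip(ez, joined))


if __name__ == "__main__":
    print("max difference:", run_demo())
```

Each sub-domain stores one extra ghost value at the interface, and only that value is refreshed per field update, which is why the split run reproduces the undivided run.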

A parallel implementation of the FDTD algorithm using MPI was described by Guiffaut and
Mahdjoubi [7]. Since then, a number of other codes using MPI have appeared. Su et al. described
an implementation of FDTD using an OpenMP-MPI parallel hybrid [8]. OpenMP is a standard
for shared memory parallelism which makes it straightforward to write code which can take
advantage of symmetric multiprocessors (SMP). Since OpenMP is supported by a number of
major vendors, it is possible to write portable C/C++ or FORTRAN code which can be used
on a number of different systems or compilers with little or no change. Using OpenMP and MPI
together allows for significant flexibility, in that the code can be run on a cluster which has a
number of single or multiple processor nodes, and on large shared memory systems such as the
SGI Origin line.

2.2  Current  Implementation 

The  application  of  FDTD  to  optics  requires  that  the  following  algorithms  be  implemented: 

•  Perfectly  Matched  Layer  (PML)  absorbing  boundaries  to  terminate  the  mesh. 

•  Wide-band  plane  wave  excitation  source,  and  a  spatial  two-dimensional  Gaussian  excitation. 

•  Lossy  dielectrics  described  by  dielectric  constant  and  static  conductivity. 

•  Drude  plasma  to  characterize  metals  in  the  optical  regime  [9]. 

•  Discrete  Fourier  Transform  to  extract  frequency  domain  data  from  the  time  variation  of 
fields. 

•  Scattering  parameters  extraction. 

•  Near-to-far-field  transformation  to  obtain  the  transmittance  and  radiation  patterns. 
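
As an illustration of the Discrete Fourier Transform item in this list, one common approach is to accumulate the transform during time stepping, so that the full time history of a field never has to be stored. The paper does not say how its DFT is organized, so the class name, method names, and test values below are assumptions; this is a sketch of the general technique only.

```python
import cmath
import math


class RunningDFT:
    """Accumulates F(f) = sum_n e(n*dt) * exp(-2*pi*i*f*n*dt) * dt."""

    def __init__(self, freqs, dt):
        self.freqs = list(freqs)
        self.dt = dt
        self.acc = [0j] * len(self.freqs)

    def add_sample(self, n, value):
        """Fold the field sample taken at time step n into every frequency bin."""
        t = n * self.dt
        for k, f in enumerate(self.freqs):
            self.acc[k] += value * cmath.exp(-2j * math.pi * f * t) * self.dt


def demo():
    f0 = 500e12   # test tone at 500 THz (illustrative)
    dt = 1e-17    # hypothetical time step giving 200 samples per period
    steps = 2000  # exactly 10 periods of the test tone
    dft = RunningDFT([f0, 2 * f0], dt)
    for n in range(steps):
        dft.add_sample(n, math.sin(2 * math.pi * f0 * n * dt))
    # For a unit sine over whole periods, |F(f0)| should equal steps*dt/2.
    return dft.acc, steps * dt / 2


if __name__ == "__main__":
    acc, expected = demo()
    print(abs(acc[0]), expected, abs(acc[1]))
```

In an FDTD code, `add_sample` would be called once per time step for each monitored field component, at the cost of one complex accumulator per monitored point and frequency.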

The  OpenMP-MPI  hybrid  parallel  FDTD  algorithm  is  implemented  as  follows: 


1. Divide the FDTD problem domain into N sub-domains, where N is the number of nodes
available to MPI. Since each node has its own memory, each sub-domain may be thought of as
ranging from (0, 0, 0) to (X, Y, Z).

2.  Divide  each  of  the  N  sub-domains  into  M  chunks,  where  M  is  the  number  of  processors 
available  in  each  node. 

3.  Perform  time  stepping  updates  of  the  electric  field  components  on  each  of  the  N  nodes: 

    for i = (m-1)*X/M to m*X/M
        for j = 0 to Y
            for k = 0 to Z
                ex(i,j,k) = ex(i,j,k) + ...
                ey(i,j,k) = ey(i,j,k) + ...
                ez(i,j,k) = ez(i,j,k) + ...
            end
        end
    end

The  value  of  m  ranges  from  1  to  M.  The  first  line  of  this  pseudo-code  shows  how  the  update 
loops  can  be  divided  up  among  M  processors  using  OpenMP.  The  complexity  shown  on  the 
first  line  is  handled  by  OpenMP. 

4.  Apply  electric  field  boundary  conditions  and  excitations.  Data  exchange  between  nodes 
using  MPI  is  treated  as  a  boundary  condition  and  is  handled  in  this  step. 

5.  Perform  time  stepping  updates  of  the  magnetic  field  components  on  each  of  the  N  nodes. 

6. Apply magnetic field boundary conditions and excitations, including MPI data exchange
between  nodes. 

7.  Repeat  from  3. 
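
The loop partitioning in step 3 can be sketched as follows. The example computes the range of i handled by processor m from the expression (m-1)*X/M to m*X/M. The half-open ranges and integer division are assumptions, since the pseudo-code does not say whether the upper bound is inclusive, and the function names are hypothetical. In an actual OpenMP program the runtime would normally compute a comparable partition for a work-sharing loop, so this arithmetic would not appear in the source.

```python
def chunk_bounds(m, num_procs, x_size):
    """Half-open i-range [lo, hi) for processor m (1-based): (m-1)*X/M to m*X/M."""
    if not 1 <= m <= num_procs:
        raise ValueError("m must be in 1..num_procs")
    lo = (m - 1) * x_size // num_procs
    hi = m * x_size // num_procs
    return lo, hi


def all_chunks(num_procs, x_size):
    """Ranges for every processor on one node, in order of m."""
    return [chunk_bounds(m, num_procs, x_size) for m in range(1, num_procs + 1)]


def covered_indices(num_procs, x_size):
    """Every i index visited when all processors run their chunk once."""
    visited = []
    for lo, hi in all_chunks(num_procs, x_size):
        visited.extend(range(lo, hi))
    return visited


if __name__ == "__main__":
    for m, (lo, hi) in enumerate(all_chunks(4, 200), start=1):
        print(f"processor {m}: i = {lo} .. {hi - 1}")
```

With half-open ranges each index is updated by exactly one processor, even when X is not a multiple of M.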

3  Results 

Degiron et al. have reported experimental evidence of surface plasmon enhanced transmission
through a single isolated aperture in a thin silver film [10]. This problem is a good test case for
the code because it is relatively small and can be evaluated quickly. Furthermore, it exercises
critical parts of the code such as the Drude plasma model. Also, published experimental results
are available for comparison with the simulations. The simulation setup consists of a 200 × 200 × 200
cell computational domain where the Yee cells are 5 nm along each dimension. The region is
terminated by PML, and is excited by a Gauss modulated sine wave centered at 500 THz. Simulations
were run for a rectangular hole in a 300 nm thick silver film, where one dimension of the hole was fixed
at 270 nm and the other dimension was different for each simulation. Each simulation was run
for 3708 time steps, with a time step of 8.66e-18 seconds.
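
As a consistency check on these parameters, the sketch below compares the stated time step with the standard Courant stability limit for a cubic Yee cell in vacuum, Δt ≤ Δx / (c√3), and computes the total simulated time. The Courant formula is a general FDTD result and is not quoted in this paper, and the function names are hypothetical. How the authors actually chose the time step is not stated.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def courant_limit(dx):
    """Largest stable time step for a cubic Yee cell of edge dx in vacuum."""
    return dx / (C0 * math.sqrt(3.0))


def check_setup(dx=5e-9, dt=8.66e-18, steps=3708, f_center=500e12):
    """Derived quantities for the cell size, time step and step count in the text."""
    limit = courant_limit(dx)
    total_time = steps * dt
    return {
        "courant_ratio": dt / limit,                  # fraction of the stability limit
        "total_time_s": total_time,                   # simulated physical time
        "periods_at_center": total_time * f_center,   # cycles of the 500 THz carrier
    }


if __name__ == "__main__":
    for name, value in check_setup().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the time step comes out at about 0.90 of the vacuum Courant limit, and the 3708 steps cover about 32 fs, or roughly 16 periods of the 500 THz carrier.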

The execution times for various numbers of CPUs and nodes on an IBM RS/6000 SP with 8
nodes, each containing sixteen 375 MHz CPUs, are summarized in Table 1.


# CPUs per node    # nodes    Runtime (seconds)
       1              1             15180
      16              1              4500
      16              2               821

Table 1: Simulation execution times
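
The speedup and parallel efficiency implied by Table 1 can be worked out with a short calculation. The sketch below takes the single-CPU run as the baseline and defines efficiency as speedup divided by the total CPU count. These definitions are the usual ones but are an assumption here, since the paper reports only the raw runtimes.

```python
# (CPUs per node, nodes, runtime in seconds), copied from Table 1
RUNS = [
    (1, 1, 15180.0),
    (16, 1, 4500.0),
    (16, 2, 821.0),
]


def speedup_table(runs):
    """Return (total CPUs, speedup, efficiency) for each run, relative to 1 CPU."""
    base = next(t for cpus, nodes, t in runs if cpus * nodes == 1)
    rows = []
    for cpus, nodes, t in runs:
        total = cpus * nodes
        speedup = base / t
        rows.append((total, speedup, speedup / total))
    return rows


if __name__ == "__main__":
    for total, speedup, eff in speedup_table(RUNS):
        print(f"{total:3d} CPUs: speedup {speedup:6.2f}, efficiency {eff:5.2f}")
```

With the Table 1 values this gives a speedup of roughly 3.4 on 16 CPUs in one node and roughly 18.5 on 32 CPUs across two nodes, relative to the single-CPU run.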


4  Conclusion 

A  new  parallel  hybrid  FDTD  code  using  MPI  and  OpenMP  technologies  has  been  introduced. 
This  code  is  significantly  faster  when  run  on  a  parallel  computer  than  when  only  one  CPU  is 
available.  The  use  of  MPI  allows  for  the  simulation  of  very  large  problems  by  distributing  the 
problem  across  multiple  machines  in  addition  to  speeding  the  execution  of  the  simulation  by  using 
more CPUs. The use of OpenMP makes it possible to utilize multiple CPUs that share a single
system image through shared memory. Together, these technologies enable the simulation of very
large problems such as those involving thin layers of dielectrics, rectangular or elliptical holes,
asymmetric arrays, and other structures of practical importance.